Overview

Dataset statistics

Number of variables: 20
Number of observations: 338592
Missing cells: 2614518
Missing cells (%): 38.6%
Duplicate rows: 309470
Duplicate rows (%): 91.4%
Total size in memory: 51.7 MiB
Average record size in memory: 160.0 B
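The minimal pandas sketch below recomputes these headline numbers from a DataFrame `df`. It makes two assumptions. First, duplicates are counted with `DataFrame.duplicated()`, which counts every repeat of an earlier row and keeps the first occurrence. Second, memory is measured shallowly with `memory_usage(deep=False)`. That choice is consistent with the 2.6 MiB reported per column below (338592 rows × 8 bytes), but it has not been checked against the pandas-profiling source.

```python
import pandas as pd


def dataset_statistics(df: pd.DataFrame) -> dict:
    """Recompute the 'Dataset statistics' table for a DataFrame (sketch)."""
    n_obs, n_vars = df.shape
    n_cells = n_obs * n_vars
    missing_cells = int(df.isna().sum().sum())
    # Every row identical to an earlier row counts once (keep="first").
    duplicate_rows = int(df.duplicated().sum())
    # Shallow measurement: object columns count as 8-byte pointers (assumption).
    memory_bytes = int(df.memory_usage(deep=False).sum())
    return {
        "n_vars": n_vars,
        "n_obs": n_obs,
        "missing_cells": missing_cells,
        "missing_cells_pct": 100 * missing_cells / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicate_rows,
        "duplicate_rows_pct": 100 * duplicate_rows / n_obs if n_obs else 0.0,
        "memory_bytes": memory_bytes,
        "avg_record_bytes": memory_bytes / n_obs if n_obs else 0.0,
    }
```

For this dataset, 2614518 / (338592 × 20) ≈ 38.6% and 309470 / 338592 ≈ 91.4%. Both values match the table.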

Variable types

Categorical: 8
Numeric: 12

Warnings

Dataset has 309470 (91.4%) duplicate rows [Duplicates]
PaySanrenpukuNinki3 is highly correlated with PaySanrentanKumi1 and 12 other fields [High correlation]
PaySanrentanKumi1 is highly correlated with PaySanrenpukuNinki3 and 3 other fields [High correlation]
PaySanrentanPay1 is highly correlated with PaySanrenpukuNinki3 and 4 other fields [High correlation]
PaySanrentanNinki1 is highly correlated with PaySanrenpukuNinki3 and 4 other fields [High correlation]
PaySanrentanKumi2 is highly correlated with PaySanrenpukuNinki3 and 3 other fields [High correlation]
PaySanrentanPay2 is highly correlated with PaySanrenpukuNinki3 and 4 other fields [High correlation]
PaySanrentanNinki2 is highly correlated with PaySanrenpukuNinki3 and 4 other fields [High correlation]
PaySanrentanKumi3 is highly correlated with PaySanrenpukuNinki3 and 12 other fields [High correlation]
PaySanrentanPay3 is highly correlated with PaySanrenpukuNinki3 and 12 other fields [High correlation]
PaySanrentanNinki3 is highly correlated with PaySanrenpukuNinki3 and 12 other fields [High correlation]
YoubiCD is highly correlated with PaySanrenpukuNinki3 and 3 other fields [High correlation]
JyuryoCD is highly correlated with PaySanrenpukuNinki3 and 3 other fields [High correlation]
Kyori is highly correlated with PaySanrenpukuNinki3 and 3 other fields [High correlation]
TrackCD is highly correlated with PaySanrenpukuNinki3 and 3 other fields [High correlation]
KyoriBefore is highly correlated with PaySanrentanPay3 and 4 other fields [High correlation]
PaySanrentanPay3 is highly correlated with KyoriBefore and 6 other fields [High correlation]
JyuryoCD is highly correlated with PaySanrentanPay3 and 3 other fields [High correlation]
PaySanrentanKumi3 is highly correlated with KyoriBefore and 6 other fields [High correlation]
PaySanrenpukuNinki3 is highly correlated with KyoriBefore and 6 other fields [High correlation]
PaySanrentanNinki3 is highly correlated with KyoriBefore and 6 other fields [High correlation]
KigoCD is highly correlated with PaySanrentanPay3 and 3 other fields [High correlation]
GradeCD is highly correlated with KyoriBefore and 4 other fields [High correlation]
PaySanrenpukuNinki3 has 338561 (> 99.9%) missing values [Missing]
PaySanrentanKumi2 has 336583 (99.4%) missing values [Missing]
PaySanrentanPay2 has 336583 (99.4%) missing values [Missing]
PaySanrentanNinki2 has 336583 (99.4%) missing values [Missing]
PaySanrentanKumi3 has 338561 (> 99.9%) missing values [Missing]
PaySanrentanPay3 has 338561 (> 99.9%) missing values [Missing]
PaySanrentanNinki3 has 338561 (> 99.9%) missing values [Missing]
GradeCD has 250525 (74.0%) missing values [Missing]
TokuNum has 322947 (95.4%) zeros [Zeros]
Nkai has 322947 (95.4%) zeros [Zeros]
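The warnings above can be approximated with plain pandas. This is a sketch, not pandas-profiling's implementation, and it has three limitations. The MISSING and ZEROS thresholds are illustrative assumptions. The 0.9 correlation cutoff is also an assumption. Only Pearson correlation between numeric columns is checked, whereas the report also flags pairs that involve categorical columns.

```python
import pandas as pd


def column_warnings(df, missing_threshold=0.0, zeros_threshold=0.1):
    """Return {column: [labels]} for MISSING / ZEROS style warnings.

    Both thresholds are hypothetical defaults chosen for illustration.
    """
    warnings = {}
    for col in df.columns:
        labels = []
        p_missing = df[col].isna().mean()
        if p_missing > missing_threshold:
            labels.append(f"MISSING ({p_missing:.1%})")
        if pd.api.types.is_numeric_dtype(df[col]):
            # Share of all rows (missing ones included) that are exactly zero.
            p_zeros = (df[col] == 0).mean()
            if p_zeros > zeros_threshold:
                labels.append(f"ZEROS ({p_zeros:.1%})")
        if labels:
            warnings[col] = labels
    return warnings


def high_correlations(df, threshold=0.9):
    """List (a, b, r) for numeric column pairs with |Pearson r| > threshold."""
    corr = df.select_dtypes("number").corr(method="pearson")
    cols = list(corr.columns)
    pairs = []
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            r = corr.loc[a, b]
            if pd.notna(r) and abs(r) > threshold:
                pairs.append((a, b, float(r)))
    return pairs
```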

Reproduction

Analysis started: 2021-04-07 13:01:14.704497
Analysis finished: 2021-04-07 13:02:20.931285
Duration: 1 minute and 6.23 seconds
Software version: pandas-profiling v2.11.0
Download configuration: config.yaml

Variables

PaySanrenpukuNinki3
Categorical

HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 2
Distinct (%): 6.5%
Missing: 338561
Missing (%): > 99.9%
Memory size: 2.6 MiB
4.0: 18
3.0: 13

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 93
Distinct characters: 4
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
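As an example of using these properties, Python's standard `unicodedata.category` returns the two-letter general category of a character: `Nd` for a decimal digit, `Po` for other punctuation, and `Lu` for an uppercase letter. The sketch below maps those codes to the long names used in this report. The three-entry mapping table is an assumption that happens to cover these columns. Script lookup is not covered, because `unicodedata` does not expose Unicode scripts.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that occur in this report's
# categorical columns; any other code is kept as its two-letter abbreviation.
CATEGORY_NAMES = {"Nd": "Decimal Number", "Po": "Other Punctuation", "Lu": "Uppercase Letter"}


def character_categories(values):
    """Count the characters of all string values by Unicode general category."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)  # e.g. "Nd" for "4", "Po" for "."
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


def is_ascii_only(values):
    """True if every character falls in the Basic Latin (ASCII) block."""
    return all(ord(ch) < 128 for value in values for ch in value)
```

For PaySanrenpukuNinki3 (18 × "4.0" and 13 × "3.0"), this gives 62 Decimal Number and 31 Other Punctuation characters, matching the counts below.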

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 3.0
2nd row: 3.0
3rd row: 3.0
4th row: 3.0
5th row: 3.0
ValueCountFrequency (%)
4.018
 
< 0.1%
3.013
 
< 0.1%
(Missing)338561
> 99.9%
Histogram of lengths of the category
ValueCountFrequency (%)
4.018
58.1%
3.013
41.9%

Most occurring characters

ValueCountFrequency (%)
.31
33.3%
031
33.3%
418
19.4%
313
14.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number62
66.7%
Other Punctuation31
33.3%

Most frequent character per category

ValueCountFrequency (%)
031
50.0%
418
29.0%
313
21.0%
ValueCountFrequency (%)
.31
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common93
100.0%

Most frequent character per script

ValueCountFrequency (%)
.31
33.3%
031
33.3%
418
19.4%
313
14.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII93
100.0%

Most frequent character per block

ValueCountFrequency (%)
.31
33.3%
031
33.3%
418
19.4%
313
14.0%

PaySanrentanKumi1
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 3997
Distinct (%): 1.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 80047.02295
Minimum: 10203
Maximum: 181716
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 10203
5th percentile: 11103
Q1: 40902
Median: 80401
Q3: 120106
95th percentile: 151614
Maximum: 181716
Range: 171513
Interquartile range (IQR): 79204

Descriptive statistics

Standard deviation: 44893.62243
Coefficient of variation (CV): 0.5608406257
Kurtosis: -1.025950167
Mean: 80047.02295
Median Absolute Deviation (MAD): 39589
Skewness: 0.184188325
Sum: 2.710328159 × 10^10
Variance: 2015437335
Monotonicity: Not monotonic
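The quantile and descriptive statistics shown for each numeric variable can be reproduced approximately with pandas. The sketch below assumes three conventions. The standard deviation is the sample version (ddof=1). Kurtosis is pandas' excess (Fisher) kurtosis. MAD is the median absolute deviation from the median. As a check on the values above, IQR = 120106 − 40902 = 79204 and CV = 44893.62 / 80047.02 ≈ 0.561.

```python
import pandas as pd


def describe_numeric(s: pd.Series) -> dict:
    """Report-style quantile and descriptive statistics for one column (sketch)."""
    s = s.dropna()
    q1, median, q3 = s.quantile([0.25, 0.5, 0.75])
    mean = s.mean()
    std = s.std()  # sample standard deviation (ddof=1)
    return {
        "mean": mean,
        "median": median,
        "iqr": q3 - q1,
        "range": s.max() - s.min(),
        "std": std,
        "cv": std / mean,  # coefficient of variation
        "mad": (s - median).abs().median(),  # median absolute deviation
        "skewness": s.skew(),
        "kurtosis": s.kurt(),  # excess kurtosis: 0 for a normal distribution
        "variance": s.var(),
        "sum": s.sum(),
        "monotonic": s.is_monotonic_increasing or s.is_monotonic_decreasing,
    }
```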
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
20301369
 
0.1%
20104323
 
0.1%
40601318
 
0.1%
40502302
 
0.1%
10402302
 
0.1%
10503300
 
0.1%
10306296
 
0.1%
80709294
 
0.1%
40605289
 
0.1%
30602287
 
0.1%
Other values (3987)335512
99.1%
ValueCountFrequency (%)
10203225
0.1%
10204237
0.1%
10205167
< 0.1%
10206208
0.1%
10207190
0.1%
ValueCountFrequency (%)
18171616
 
< 0.1%
18171513
 
< 0.1%
18171456
< 0.1%
18171046
< 0.1%
18170914
 
< 0.1%
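The minimum 10203 and maximum 181716 read naturally as 01-02-03 and 18-17-16. This suggests that each PaySanrentanKumi code packs three two-digit horse numbers (first, second, third), with the leading zero lost when the code is stored as a number. The report does not confirm this encoding, so the decoder below is hypothetical and should be checked against the data source before use.

```python
def decode_sanrentan_kumi(kumi: float) -> tuple:
    """Split a combination code such as 10203.0 into (1, 2, 3).

    Hypothetical decoding: assumes three zero-padded two-digit numbers
    whose leading zero was dropped by numeric storage.
    """
    code = int(kumi)
    if not 0 <= code <= 999999:
        raise ValueError(f"unexpected combination code: {kumi}")
    digits = f"{code:06d}"  # 10203 -> "010203"
    return tuple(int(digits[i:i + 2]) for i in range(0, 6, 2))
```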

PaySanrentanPay1
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 13586
Distinct (%): 4.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 158448.6548
Minimum: 430
Maximum: 27929360
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 430
5th percentile: 3280
Q1: 11350
Median: 31970
Q3: 103710
95th percentile: 637510
Maximum: 27929360
Range: 27928930
Interquartile range (IQR): 92360

Descriptive statistics

Standard deviation: 570696.2805
Coefficient of variation (CV): 3.601774223
Kurtosis: 462.8865925
Mean: 158448.6548
Median Absolute Deviation (MAD): 25530
Skewness: 15.98290073
Sum: 5.364944692 × 10^10
Variance: 3.256942446 × 10^11
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
4360198
 
0.1%
4700192
 
0.1%
5650184
 
0.1%
4450184
 
0.1%
4510180
 
0.1%
5170180
 
0.1%
5310178
 
0.1%
3220177
 
0.1%
4010176
 
0.1%
10370171
 
0.1%
Other values (13576)336772
99.5%
ValueCountFrequency (%)
43020
< 0.1%
4508
 
< 0.1%
4704
 
< 0.1%
48017
< 0.1%
5009
< 0.1%
ValueCountFrequency (%)
2792936010
< 0.1%
2294615012
< 0.1%
2180232010
< 0.1%
1950701012
< 0.1%
1299528013
< 0.1%

PaySanrentanNinki1
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1894
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 256.3821709
Minimum: 1
Maximum: 4602
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 1
5th percentile: 3
Q1: 24
Median: 93
Q3: 296
95th percentile: 1102
Maximum: 4602
Range: 4601
Interquartile range (IQR): 272

Descriptive statistics

Standard deviation: 415.8957262
Coefficient of variation (CV): 1.622171014
Kurtosis: 14.27120811
Mean: 256.3821709
Median Absolute Deviation (MAD): 83
Skewness: 3.257022375
Sum: 86808952
Variance: 172969.2551
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
16985
 
2.1%
26565
 
1.9%
35353
 
1.6%
45081
 
1.5%
54638
 
1.4%
74146
 
1.2%
63846
 
1.1%
93748
 
1.1%
83681
 
1.1%
103632
 
1.1%
Other values (1884)290917
85.9%
ValueCountFrequency (%)
16985
2.1%
26565
1.9%
35353
1.6%
45081
1.5%
54638
1.4%
ValueCountFrequency (%)
460217
< 0.1%
447110
< 0.1%
407114
< 0.1%
385112
< 0.1%
378218
< 0.1%

PaySanrentanKumi2
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 164
Distinct (%): 8.2%
Missing: 336583
Missing (%): 99.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 81173.19512
Minimum: 10307
Maximum: 181702
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 10307
5th percentile: 11412.6
Q1: 41503
Median: 80410
Q3: 110805
95th percentile: 151309
Maximum: 181702
Range: 171395
Interquartile range (IQR): 69302

Descriptive statistics

Standard deviation: 42835.73805
Coefficient of variation (CV): 0.5277079212
Kurtosis: -0.7256813881
Mean: 81173.19512
Median Absolute Deviation (MAD): 30395
Skewness: 0.2526079037
Sum: 163076949
Variance: 1834900455
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
11080528
 
< 0.1%
11060226
 
< 0.1%
8021224
 
< 0.1%
13141518
 
< 0.1%
18101418
 
< 0.1%
8180418
 
< 0.1%
4141817
 
< 0.1%
7050417
 
< 0.1%
8160717
 
< 0.1%
2110717
 
< 0.1%
Other values (154)1809
 
0.5%
(Missing)336583
99.4%
ValueCountFrequency (%)
1030710
< 0.1%
104038
< 0.1%
1060713
< 0.1%
1060811
< 0.1%
1071114
< 0.1%
ValueCountFrequency (%)
18170217
< 0.1%
18101418
< 0.1%
17130816
< 0.1%
16130313
< 0.1%
16111212
< 0.1%

PaySanrentanPay2
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 162
Distinct (%): 8.1%
Missing: 336583
Missing (%): 99.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 91717.59084
Minimum: 530
Maximum: 1738760
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 530
5th percentile: 2790
Q1: 7790
Median: 23570
Q3: 70150
95th percentile: 474300
Maximum: 1738760
Range: 1738230
Interquartile range (IQR): 62360

Descriptive statistics

Standard deviation: 221911.2627
Coefficient of variation (CV): 2.419506015
Kurtosis: 24.90932199
Mean: 91717.59084
Median Absolute Deviation (MAD): 17910
Skewness: 4.703103674
Sum: 184260640
Variance: 4.924460851 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
421042
 
< 0.1%
1028026
 
< 0.1%
3200021
 
< 0.1%
875018
 
< 0.1%
7343018
 
< 0.1%
7292018
 
< 0.1%
984017
 
< 0.1%
991017
 
< 0.1%
342017
 
< 0.1%
2429017
 
< 0.1%
Other values (152)1798
 
0.5%
(Missing)336583
99.4%
ValueCountFrequency (%)
5304
< 0.1%
5805
< 0.1%
5908
< 0.1%
6207
< 0.1%
8506
< 0.1%
ValueCountFrequency (%)
17387608
< 0.1%
140023014
< 0.1%
126827015
< 0.1%
76034016
< 0.1%
67608016
< 0.1%

PaySanrentanNinki2
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 134
Distinct (%): 6.7%
Missing: 336583
Missing (%): 99.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 325.8760577
Minimum: 1
Maximum: 3404
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 1
5th percentile: 6
Q1: 37
Median: 141
Q3: 369
95th percentile: 1227
Maximum: 3404
Range: 3403
Interquartile range (IQR): 332

Descriptive statistics

Standard deviation: 517.3715154
Coefficient of variation (CV): 1.587632792
Kurtosis: 14.89011922
Mean: 325.8760577
Median Absolute Deviation (MAD): 119
Skewness: 3.45410245
Sum: 654685
Variance: 267673.2849
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
644
 
< 0.1%
1344
 
< 0.1%
11339
 
< 0.1%
2130
 
< 0.1%
2229
 
< 0.1%
5328
 
< 0.1%
12728
 
< 0.1%
1727
 
< 0.1%
6826
 
< 0.1%
2826
 
< 0.1%
Other values (124)1688
 
0.5%
(Missing)336583
99.4%
ValueCountFrequency (%)
122
< 0.1%
213
 
< 0.1%
36
 
< 0.1%
424
< 0.1%
644
< 0.1%
ValueCountFrequency (%)
340416
< 0.1%
305316
< 0.1%
196315
< 0.1%
18058
< 0.1%
156715
< 0.1%

PaySanrentanKumi3
Categorical

HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 2
Distinct (%): 6.5%
Missing: 338561
Missing (%): > 99.9%
Memory size: 2.6 MiB
81811.0: 18
60514.0: 13

Length

Max length: 7
Median length: 7
Mean length: 7
Min length: 7

Characters and Unicode

Total characters: 217
Distinct characters: 7
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 60514.0
2nd row: 60514.0
3rd row: 60514.0
4th row: 60514.0
5th row: 60514.0
ValueCountFrequency (%)
81811.018
 
< 0.1%
60514.013
 
< 0.1%
(Missing)338561
> 99.9%
Histogram of lengths of the category
ValueCountFrequency (%)
81811.018
58.1%
60514.013
41.9%

Most occurring characters

ValueCountFrequency (%)
167
30.9%
044
20.3%
836
16.6%
.31
14.3%
613
 
6.0%
513
 
6.0%
413
 
6.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number186
85.7%
Other Punctuation31
 
14.3%

Most frequent character per category

ValueCountFrequency (%)
167
36.0%
044
23.7%
836
19.4%
613
 
7.0%
513
 
7.0%
413
 
7.0%
ValueCountFrequency (%)
.31
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common217
100.0%

Most frequent character per script

ValueCountFrequency (%)
167
30.9%
044
20.3%
836
16.6%
.31
14.3%
613
 
6.0%
513
 
6.0%
413
 
6.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII217
100.0%

Most frequent character per block

ValueCountFrequency (%)
167
30.9%
044
20.3%
836
16.6%
.31
14.3%
613
 
6.0%
513
 
6.0%
413
 
6.0%

PaySanrentanPay3
Categorical

HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 2
Distinct (%): 6.5%
Missing: 338561
Missing (%): > 99.9%
Memory size: 2.6 MiB
5410.0: 18
4620.0: 13

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Characters and Unicode

Total characters: 186
Distinct characters: 7
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 4620.0
2nd row: 4620.0
3rd row: 4620.0
4th row: 4620.0
5th row: 4620.0
ValueCountFrequency (%)
5410.018
 
< 0.1%
4620.013
 
< 0.1%
(Missing)338561
> 99.9%
Histogram of lengths of the category
ValueCountFrequency (%)
5410.018
58.1%
4620.013
41.9%

Most occurring characters

ValueCountFrequency (%)
062
33.3%
431
16.7%
.31
16.7%
518
 
9.7%
118
 
9.7%
613
 
7.0%
213
 
7.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number155
83.3%
Other Punctuation31
 
16.7%

Most frequent character per category

ValueCountFrequency (%)
062
40.0%
431
20.0%
518
 
11.6%
118
 
11.6%
613
 
8.4%
213
 
8.4%
ValueCountFrequency (%)
.31
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common186
100.0%

Most frequent character per script

ValueCountFrequency (%)
062
33.3%
431
16.7%
.31
16.7%
518
 
9.7%
118
 
9.7%
613
 
7.0%
213
 
7.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII186
100.0%

Most frequent character per block

ValueCountFrequency (%)
062
33.3%
431
16.7%
.31
16.7%
518
 
9.7%
118
 
9.7%
613
 
7.0%
213
 
7.0%

PaySanrentanNinki3
Categorical

HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 2
Distinct (%): 6.5%
Missing: 338561
Missing (%): > 99.9%
Memory size: 2.6 MiB
14.0: 18
26.0: 13

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Characters and Unicode

Total characters: 124
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 26.0
2nd row: 26.0
3rd row: 26.0
4th row: 26.0
5th row: 26.0
ValueCountFrequency (%)
14.018
 
< 0.1%
26.013
 
< 0.1%
(Missing)338561
> 99.9%
Histogram of lengths of the category
ValueCountFrequency (%)
14.018
58.1%
26.013
41.9%

Most occurring characters

ValueCountFrequency (%)
.31
25.0%
031
25.0%
118
14.5%
418
14.5%
213
10.5%
613
10.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number93
75.0%
Other Punctuation31
 
25.0%

Most frequent character per category

ValueCountFrequency (%)
031
33.3%
118
19.4%
418
19.4%
213
14.0%
613
14.0%
ValueCountFrequency (%)
.31
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common124
100.0%

Most frequent character per script

ValueCountFrequency (%)
.31
25.0%
031
25.0%
118
14.5%
418
14.5%
213
10.5%
613
10.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII124
100.0%

Most frequent character per block

ValueCountFrequency (%)
.31
25.0%
031
25.0%
118
14.5%
418
14.5%
213
10.5%
613
10.5%

YoubiCD
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 8
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.645511412
Minimum: 1
Maximum: 8
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 1
Median: 2
Q3: 2
95th percentile: 3
Maximum: 8
Range: 7
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.7493408244
Coefficient of variation (CV): 0.455384763
Kurtosis: 11.88427302
Mean: 1.645511412
Median Absolute Deviation (MAD): 1
Skewness: 2.238450502
Sum: 557157
Variance: 0.5615116711
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=8)
ValueCountFrequency (%)
2157548
46.5%
1155859
46.0%
319804
 
5.8%
42958
 
0.9%
51015
 
0.3%
7837
 
0.2%
8299
 
0.1%
6272
 
0.1%
ValueCountFrequency (%)
1155859
46.0%
2157548
46.5%
319804
 
5.8%
42958
 
0.9%
51015
 
0.3%
ValueCountFrequency (%)
8299
 
0.1%
7837
 
0.2%
6272
 
0.1%
51015
 
0.3%
42958
0.9%

TokuNum
Real number (ℝ≥0)

ZEROS

Distinct: 141
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.137808926
Minimum: 0
Maximum: 212
Zeros: 322947
Zeros (%): 95.4%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 0
Maximum: 212
Range: 212
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 22.77716081
Coefficient of variation (CV): 5.504642968
Kurtosis: 39.92066535
Mean: 4.137808926
Median Absolute Deviation (MAD): 0
Skewness: 6.1903803
Sum: 1401029
Variance: 518.7990546
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0322947
95.4%
19183
 
0.1%
11181
 
0.1%
14179
 
0.1%
1176
 
0.1%
18171
 
0.1%
7162
 
< 0.1%
13162
 
< 0.1%
92161
 
< 0.1%
78160
 
< 0.1%
Other values (131)14110
 
4.2%
ValueCountFrequency (%)
0322947
95.4%
1176
 
0.1%
2134
 
< 0.1%
3136
 
< 0.1%
4117
 
< 0.1%
ValueCountFrequency (%)
21214
 
< 0.1%
21115
 
< 0.1%
21014
 
< 0.1%
20942
< 0.1%
20876
< 0.1%

Nkai
Real number (ℝ≥0)

ZEROS

Distinct: 139
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.106169077
Minimum: 0
Maximum: 162
Zeros: 322947
Zeros (%): 95.4%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 0
Maximum: 162
Range: 162
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 11.34855447
Coefficient of variation (CV): 5.38824475
Kurtosis: 58.02974961
Mean: 2.106169077
Median Absolute Deviation (MAD): 0
Skewness: 6.862258032
Sum: 713132
Variance: 128.7896885
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0322947
95.4%
53375
 
0.1%
55359
 
0.1%
63342
 
0.1%
17342
 
0.1%
52325
 
0.1%
54325
 
0.1%
18310
 
0.1%
56309
 
0.1%
58303
 
0.1%
Other values (129)12655
 
3.7%
ValueCountFrequency (%)
0322947
95.4%
186
 
< 0.1%
290
 
< 0.1%
3102
 
< 0.1%
480
 
< 0.1%
ValueCountFrequency (%)
16212
< 0.1%
16113
< 0.1%
16016
< 0.1%
15913
< 0.1%
15811
< 0.1%

GradeCD
Categorical

HIGH CORRELATION
MISSING

Distinct: 9
Distinct (%): < 0.1%
Missing: 250525
Missing (%): 74.0%
Memory size: 2.6 MiB
E: 71024
C: 6880
B: 4166
A: 3394
L: 1398
Other values (4): 1205

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 88067
Distinct characters: 9
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: E
2nd row: E
3rd row: E
4th row: E
5th row: E
ValueCountFrequency (%)
E71024
 
21.0%
C6880
 
2.0%
B4166
 
1.2%
A3394
 
1.0%
L1398
 
0.4%
H513
 
0.2%
G337
 
0.1%
F234
 
0.1%
D121
 
< 0.1%
(Missing)250525
74.0%
Histogram of lengths of the category
ValueCountFrequency (%)
e71024
80.6%
c6880
 
7.8%
b4166
 
4.7%
a3394
 
3.9%
l1398
 
1.6%
h513
 
0.6%
g337
 
0.4%
f234
 
0.3%
d121
 
0.1%

Most occurring characters

ValueCountFrequency (%)
E71024
80.6%
C6880
 
7.8%
B4166
 
4.7%
A3394
 
3.9%
L1398
 
1.6%
H513
 
0.6%
G337
 
0.4%
F234
 
0.3%
D121
 
0.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter88067
100.0%

Most frequent character per category

ValueCountFrequency (%)
E71024
80.6%
C6880
 
7.8%
B4166
 
4.7%
A3394
 
3.9%
L1398
 
1.6%
H513
 
0.6%
G337
 
0.4%
F234
 
0.3%
D121
 
0.1%

Most occurring scripts

ValueCountFrequency (%)
Latin88067
100.0%

Most frequent character per script

ValueCountFrequency (%)
E71024
80.6%
C6880
 
7.8%
B4166
 
4.7%
A3394
 
3.9%
L1398
 
1.6%
H513
 
0.6%
G337
 
0.4%
F234
 
0.3%
D121
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII88067
100.0%

Most frequent character per block

ValueCountFrequency (%)
E71024
80.6%
C6880
 
7.8%
B4166
 
4.7%
A3394
 
3.9%
L1398
 
1.6%
H513
 
0.6%
G337
 
0.4%
F234
 
0.3%
D121
 
0.1%

SyubetuCD
Real number (ℝ≥0)

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 12.73259557
Minimum: 11
Maximum: 19
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 11
5th percentile: 11
Q1: 12
Median: 13
Q3: 13
95th percentile: 14
Maximum: 19
Range: 8
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.480650616
Coefficient of variation (CV): 0.1162881996
Kurtosis: 5.517360522
Mean: 12.73259557
Median Absolute Deviation (MAD): 1
Skewness: 1.862398938
Sum: 4311155
Variance: 2.192326245
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=6)
ValueCountFrequency (%)
12108174
31.9%
1389410
26.4%
1470494
20.8%
1158656
17.3%
186697
 
2.0%
195161
 
1.5%
ValueCountFrequency (%)
1158656
17.3%
12108174
31.9%
1389410
26.4%
1470494
20.8%
186697
 
2.0%
ValueCountFrequency (%)
195161
 
1.5%
186697
 
2.0%
1470494
20.8%
1389410
26.4%
12108174
31.9%

KigoCD
Categorical

HIGH CORRELATION

Distinct: 31
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.6 MiB
A03: 120825
003: 81074
023: 40734
A04: 25424
A00: 25137
Other values (26): 45398

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 1015776
Distinct characters: 8
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: A03
2nd row: A04
3rd row: A04
4th row: A03
5th row: A03
ValueCountFrequency (%)
A03120825
35.7%
00381074
23.9%
02340734
 
12.0%
A0425424
 
7.5%
A0025137
 
7.4%
00011795
 
3.5%
N046258
 
1.8%
N015979
 
1.8%
0205503
 
1.6%
N033000
 
0.9%
Other values (21)12863
 
3.8%
Histogram of lengths of the category
ValueCountFrequency (%)
a03120825
35.7%
00381074
23.9%
02340734
 
12.0%
a0425424
 
7.5%
a0025137
 
7.4%
00011795
 
3.5%
n046258
 
1.8%
n015979
 
1.8%
0205503
 
1.6%
n033000
 
0.9%
Other values (21)12863
 
3.8%

Most occurring characters

ValueCountFrequency (%)
0472178
46.5%
3246900
24.3%
A173457
 
17.1%
254578
 
5.4%
437131
 
3.7%
N20881
 
2.1%
110058
 
1.0%
M593
 
0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number820845
80.8%
Uppercase Letter194931
 
19.2%

Most frequent character per category

ValueCountFrequency (%)
0472178
57.5%
3246900
30.1%
254578
 
6.6%
437131
 
4.5%
110058
 
1.2%
ValueCountFrequency (%)
A173457
89.0%
N20881
 
10.7%
M593
 
0.3%

Most occurring scripts

ValueCountFrequency (%)
Common820845
80.8%
Latin194931
 
19.2%

Most frequent character per script

ValueCountFrequency (%)
0472178
57.5%
3246900
30.1%
254578
 
6.6%
437131
 
4.5%
110058
 
1.2%
ValueCountFrequency (%)
A173457
89.0%
N20881
 
10.7%
M593
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII1015776
100.0%

Most frequent character per block

ValueCountFrequency (%)
0472178
46.5%
3246900
24.3%
A173457
 
17.1%
254578
 
5.4%
437131
 
3.7%
N20881
 
2.1%
110058
 
1.0%
M593
 
0.1%

JyuryoCD
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.6 MiB
3: 162725
4: 138594
1: 20984
2: 16289

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 338592
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 4
3rd row: 4
4th row: 1
5th row: 1
ValueCountFrequency (%)
3162725
48.1%
4138594
40.9%
120984
 
6.2%
216289
 
4.8%
Histogram of lengths of the category
ValueCountFrequency (%)
3162725
48.1%
4138594
40.9%
120984
 
6.2%
216289
 
4.8%

Most occurring characters

ValueCountFrequency (%)
3162725
48.1%
4138594
40.9%
120984
 
6.2%
216289
 
4.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number338592
100.0%

Most frequent character per category

ValueCountFrequency (%)
3162725
48.1%
4138594
40.9%
120984
 
6.2%
216289
 
4.8%

Most occurring scripts

ValueCountFrequency (%)
Common338592
100.0%

Most frequent character per script

ValueCountFrequency (%)
3162725
48.1%
4138594
40.9%
120984
 
6.2%
216289
 
4.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII338592
100.0%

Most frequent character per block

ValueCountFrequency (%)
3162725
48.1%
4138594
40.9%
120984
 
6.2%
216289
 
4.8%

Kyori
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 49
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1655.659082
Minimum: 1000
Maximum: 4260
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 1000
5th percentile: 1200
Q1: 1400
Median: 1600
Q3: 1800
95th percentile: 2400
Maximum: 4260
Range: 3260
Interquartile range (IQR): 400

Descriptive statistics

Standard deviation: 428.5184892
Coefficient of variation (CV): 0.2588204865
Kurtosis: 3.330046133
Mean: 1655.659082
Median Absolute Deviation (MAD): 200
Skewness: 1.406640369
Sum: 560592920
Variance: 183628.0956
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=49)
ValueCountFrequency (%)
180078345
23.1%
120069466
20.5%
140055219
16.3%
160044069
13.0%
200029701
 
8.8%
170013212
 
3.9%
24007402
 
2.2%
22006591
 
1.9%
10006436
 
1.9%
19003776
 
1.1%
Other values (39)24375
 
7.2%
ValueCountFrequency (%)
10006436
 
1.9%
11501408
 
0.4%
120069466
20.5%
13003365
 
1.0%
140055219
16.3%
ValueCountFrequency (%)
426011
 
< 0.1%
4250110
< 0.1%
4100113
< 0.1%
3930116
< 0.1%
3900133
< 0.1%

KyoriBefore
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.6 MiB
0: 338582
2000: 10

Length

Max length: 4
Median length: 1
Mean length: 1.000088602
Min length: 1

Characters and Unicode

Total characters: 338622
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0
ValueCountFrequency (%)
0338582
> 99.9%
200010
 
< 0.1%
Histogram of lengths of the category
ValueCountFrequency (%)
0338582
> 99.9%
200010
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
0338612
> 99.9%
210
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number338622
100.0%

Most frequent character per category

ValueCountFrequency (%)
0338612
> 99.9%
210
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
Common338622
100.0%

Most frequent character per script

ValueCountFrequency (%)
0338612
> 99.9%
210
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII338622
100.0%

Most frequent character per block

ValueCountFrequency (%)
0338612
> 99.9%
210
 
< 0.1%

TrackCD
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 13
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20.90511294
Minimum: 10
Maximum: 57
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 10
5th percentile: 11
Q1: 17
Median: 23
Q3: 24
95th percentile: 24
Maximum: 57
Range: 47
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 7.666498496
Coefficient of variation (CV): 0.3667283941
Kurtosis: 8.530745825
Mean: 20.90511294
Median Absolute Deviation (MAD): 1
Skewness: 2.30592168
Sum: 7078304
Variance: 58.77519919
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=13)
ValueCountFrequency (%)
24119996
35.4%
1765207
19.3%
2351257
15.1%
1144554
 
13.2%
1840557
 
12.0%
526182
 
1.8%
544362
 
1.3%
123634
 
1.1%
101394
 
0.4%
56997
 
0.3%
Other values (3)452
 
0.1%
ValueCountFrequency (%)
101394
 
0.4%
1144554
13.2%
123634
 
1.1%
1765207
19.3%
1840557
12.0%
ValueCountFrequency (%)
5794
 
< 0.1%
56997
 
0.3%
55223
 
0.1%
544362
1.3%
526182
1.8%

Interactions

Correlations

Pearson's r

The Pearson correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables (for positive scale factors). As a result, for an exact linear relationship r is ±1 whatever the slope of the line, as long as the slope is nonzero.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
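The definition of r translates directly into code. In this minimal pure-Python sketch, the 1/(n − 1) normalisation factors of the covariance and of the two standard deviations cancel, so only sums of centred products are needed.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # The 1/(n - 1) factors of cov, sx and sy cancel, so they are omitted.
    return cov / (sx * sy)
```

Rescaling x by 10 and transforming y to 3y + 2 leaves the result unchanged, which illustrates the location and scale invariance described above.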

Spearman's ρ

The Spearman rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
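A minimal sketch of that recipe follows: rank each variable, giving tied values the average of the ranks they span, then apply the Pearson formula to the ranks. The tie-handling rule is the common convention and is assumed here rather than taken from the report.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the rank variables of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

For y = x² on positive x, ρ is 1 even though the relationship is not linear.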

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n − 1)/2. This is the τ-a form; variants such as τ-b adjust the denominator for ties.
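The τ-a definition can be written as a direct O(n²) sketch. A pair counts as concordant when both variables move in the same direction and as discordant when they move in opposite directions. Tied pairs count as neither, but they remain in the denominator.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs; ties count as neither."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)
```

For x = [1, 2, 3] and y = [1, 3, 2], two pairs are concordant and one is discordant, so τ = (2 − 1) / 3 ≈ 0.333.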

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. The phik package is extensively documented.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The plain empirical estimator of Cramér's V is known to be biased, even for large samples, so the report uses a bias-corrected measure proposed by Bergsma (2013).
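The sketch below shows the bias correction as it is commonly implemented for an r × k contingency table of counts. Both φ² = χ²/n and the effective table dimensions are shrunk before the ratio is taken. The sketch assumes a plain Pearson χ² without a continuity correction and a table with no all-zero rows or columns. It has not been verified to match pandas-profiling's code line for line.

```python
import numpy as np


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V (Bergsma, 2013) for an r x k table of counts."""
    obs = np.asarray(table, dtype=float)
    n = obs.sum()
    r, k = obs.shape
    # Expected counts under independence; assumes no all-zero rows/columns.
    expected = np.outer(obs.sum(axis=1), obs.sum(axis=0)) / n
    chi2 = ((obs - expected) ** 2 / expected).sum()  # no continuity correction
    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return float(np.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1)))
```

A perfectly diagonal 2 × 2 table such as [[50, 0], [0, 50]] gives V = 1, and a uniform table gives V = 0.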

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
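The nullity correlation behind the heatmap can be approximated as the Pearson correlation between the columns' missing-value indicators. In this sketch, columns that are never missing or always missing are dropped first, because their indicator has no variance and the correlation is undefined. Treating this as the exact computation behind the chart is an assumption.

```python
import pandas as pd


def nullity_correlation(df: pd.DataFrame) -> pd.DataFrame:
    """Correlation matrix of the isna() indicators of partially missing columns."""
    nulls = df.isna()
    # Keep only columns with some, but not all, values missing.
    partial = nulls.loc[:, nulls.any() & ~nulls.all()]
    return partial.astype(float).corr()
```

A value near +1 means two columns tend to be missing together, as one would expect for PaySanrentanKumi2, PaySanrentanPay2 and PaySanrentanNinki2, which share the same missing count.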

Sample

First rows

PaySanrenpukuNinki3PaySanrentanKumi1PaySanrentanPay1PaySanrentanNinki1PaySanrentanKumi2PaySanrentanPay2PaySanrentanNinki2PaySanrentanKumi3PaySanrentanPay3PaySanrentanNinki3YoubiCDTokuNumNkaiGradeCDSyubetuCDKigoCDJyuryoCDKyoriKyoriBeforeTrackCD
0nan1615051549027nannannannannannan100E13A0311200017
1nan8090141303nannannannannannan100E13A0441200017
2nan4080531803nannannannannannan200E13A0441200017
3nan1516067314801833nannannannannannan300E14A0311200011
4nan1615051549027nannannannannannan100E13A0311200017
5nan9121644600136nannannannannannan200E13A0441200017
6nan10812310660672nannannannannannan100E13A0341200017
7nan1606072343055nannannannannannan100NaN1400341200017
8nan14161230151102280nannannannannannan200E1400011200017
9nan14081163304nannannannannannan200NaN1400341200011

Last rows

PaySanrenpukuNinki3PaySanrentanKumi1PaySanrentanPay1PaySanrentanNinki1PaySanrentanKumi2PaySanrentanPay2PaySanrentanNinki2PaySanrentanKumi3PaySanrentanPay3PaySanrentanNinki3YoubiCDTokuNumNkaiGradeCDSyubetuCDKigoCDJyuryoCDKyoriKyoriBeforeTrackCD
338582nan2050843407nannannannannannan100E11M0121200017
338583nan2050843407nannannannannannan100E11M0121200017
338584nan2130183220263nannannannannannan200NaN1202331200017
338585nan81506373540900nannannannannannan200NaN1202332000017
338586nan402121452028nannannannannannan200NaN13A0341000024
338587nan10080139160127nannannannannannan200NaN1300441200017
338588nan1006021383034nannannannannannan200E13N0121800017
338589nan7140665009nannannannannannan200NaN11A0331200017
338590nan50904280990613nannannannannannan200NaN1200332000017
338591nan100914173200453nannannannannannan200NaN11A0331800017

Duplicate rows

Most frequent

PaySanrenpukuNinki3PaySanrentanKumi1PaySanrentanPay1PaySanrentanNinki1PaySanrentanKumi2PaySanrentanPay2PaySanrentanNinki2PaySanrentanKumi3PaySanrentanPay3PaySanrentanNinki3YoubiCDTokuNumNkaiGradeCDSyubetuCDKigoCDJyuryoCDKyoriKyoriBeforeTrackCDcount
03.006050268805160,508.0013,310.00109.0060,514.004,620.0026.00100E13A041120002413